Model-based reconstruction employing the time separation technique (TST) has been shown to improve dynamic perfusion imaging of the liver using C-arm cone-beam computed tomography (CBCT). To apply TST with prior knowledge extracted from CT perfusion data, the liver must be accurately segmented from the CT scans. Reconstructions of both primary and model-based CBCT data also need to be segmented for the perfusion maps to be visualised and interpreted correctly. This work proposes Turbolift learning, which trains a modified multi-scale Attention UNet on different liver segmentation tasks serially, in the order CT, CBCT, CBCT TST, so that each earlier training acts as a pre-training stage for the subsequent one, thereby addressing the problem of limited training data sets. For the final task of liver segmentation in CBCT TST, the proposed method achieved overall Dice scores of 0.874$\pm$0.031 and 0.905$\pm$0.007 in 6-fold and 4-fold cross-validation experiments, respectively, a statistically significant improvement over a model trained only on that task. The experiments further show that Turbolift not only improves the overall performance of the model but also makes it robust against artefacts originating from embolisation materials and from truncation. In addition, an in-depth analysis confirmed the chosen order of the segmentation tasks. This paper demonstrates the potential of segmenting the liver from CT, CBCT, and CBCT TST while learning from the limited training data available, which may in future be used for visualising and evaluating perfusion maps in the assessment of liver diseases.
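To make the serial training scheme concrete, the following is a minimal PyTorch sketch of the idea, with dummy data loaders and a toy stand-in for the multi-scale Attention UNet; all names, the loss, and the hyperparameters are illustrative assumptions, not the paper's implementation.

```python
# Sketch of Turbolift-style serial fine-tuning: the same network is trained on
# CT, then CBCT, then CBCT TST segmentation, each stage warm-starting the next.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

def make_loader(n=8):
    # Dummy stand-in for a real (image, liver-mask) dataset.
    return DataLoader(TensorDataset(torch.randn(n, 1, 64, 64),
                                    torch.randint(0, 2, (n, 1, 64, 64)).float()),
                      batch_size=4)

def train_stage(model, loader, epochs=2, lr=1e-4):
    """Fine-tune the same network on one segmentation task."""
    opt = optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()   # a Dice-based loss is another common choice
    for _ in range(epochs):
        for image, mask in loader:
            opt.zero_grad()
            loss_fn(model(image), mask).backward()
            opt.step()
    return model

# Tiny stand-in for the paper's multi-scale Attention UNet.
model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 1, 3, padding=1))

# Turbolift order: each earlier task acts as pre-training for the next.
for stage_loader in (make_loader(), make_loader(), make_loader()):  # CT, CBCT, CBCT TST
    model = train_stage(model, stage_loader)
```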
CT and MRI are two widely used clinical imaging modalities for non-invasive diagnosis. However, both modalities come with certain problems: CT uses harmful ionising radiation, and MRI suffers from slow acquisition speed. Undersampling, such as sparse sampling, can address both issues, but the undersampled data lead to lower resolution and introduce artefacts. Several techniques, including deep learning based methods, have been proposed to reconstruct such data. However, the undersampled reconstruction problem for these two modalities has always been treated as two different problems and tackled separately by different research efforts. This paper proposes a unified solution for both sparse CT and undersampled radial MRI reconstruction: a Fourier transform based pre-processing step converts the radial MRI data into a sinogram, after which both modalities are reconstructed using sinogram upsampling combined with filtered back-projection, followed by a convolutional neural network. The Primal-Dual network is a deep learning based method for reconstructing sparsely sampled CT data. This paper introduces the Primal-Dual UNet, which improves upon the Primal-Dual network in terms of both accuracy and reconstruction speed. The proposed method achieved an average SSIM of 0.932 for sparse CT reconstruction with fan-beam geometry at a sparsity level of 16, a statistically significant improvement over the previous model, which achieved 0.919. Furthermore, the proposed model achieved average SSIMs of 0.903 and 0.957 when reconstructing undersampled brain and abdominal MRI data with an acceleration factor of 16, statistically significant improvements over the original model, which achieved 0.867 and 0.949. Finally, this paper shows that the proposed network not only improves overall image quality but also improves image quality in the regions of interest, and generalises better in the presence of needles.
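The classical front end shared by the two modalities can be sketched as follows; radial MRI is first converted to a sinogram via the Fourier slice theorem, so a sparse CT sinogram stands in for both here. This is an illustrative scikit-image sketch with assumed sizes, and the learned Primal-Dual UNet refinement stage is omitted.

```python
# Shared front end: upsample a sparsely sampled sinogram along the angle axis,
# then reconstruct with filtered back-projection. The CNN refinement is omitted.
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, resize

phantom = resize(shepp_logan_phantom(), (128, 128))

full_angles = np.linspace(0.0, 180.0, 180, endpoint=False)
sparse_angles = full_angles[::16]                      # sparsity level 16, as in the paper
sparse_sino = radon(phantom, theta=sparse_angles)      # shape: (detectors, views)

# Linear interpolation along the angular axis back to the full view count.
upsampled = np.stack([np.interp(full_angles, sparse_angles, row)
                      for row in sparse_sino])

recon = iradon(upsampled, theta=full_angles, filter_name="ramp")
print(recon.shape)   # this intermediate result is what the network would refine
```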
Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. Currently it leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously being updated and can be extended to text corpora from other domains. We see this system as a general-purpose tool for future research applications in the social sciences and other domains.
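As a rough illustration of the underlying mechanism (dense document embeddings plus nearest-neighbour search), the following sketch uses the sentence-transformers library with a small public model; it is emphatically not Logic Mill's actual API, model, or corpus.

```python
# Generic embedding-based semantic similarity, the core idea behind systems
# like Logic Mill. Documents and the query below are invented examples.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")   # small stand-in for a large LM
docs = [
    "A method for aligning patent claims with scientific prior art.",
    "Deep learning for protein structure prediction.",
    "Semantic search over large patent corpora.",
]
query = "Finding scientific publications similar to a patent document."

doc_emb = model.encode(docs, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

# Rank documents by cosine similarity to the query embedding.
for hit in util.semantic_search(query_emb, doc_emb, top_k=2)[0]:
    print(docs[hit["corpus_id"]], round(hit["score"], 3))
```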
In this paper, we present a modular methodology that combines state-of-the-art methods in (stochastic) machine learning with traditional methods in rule learning to provide efficient and scalable algorithms for the classification of vast data sets, while remaining explainable. Apart from evaluating our approach on the common large scale data sets MNIST, Fashion-MNIST and IMDB, we present novel results on explainable classifications of dental bills. The latter case study stems from an industrial collaboration with Allianz Private Krankenversicherungs-Aktiengesellschaft which is an insurance company offering diverse services in Germany.
Key Point Analysis (KPA) is a relatively new task in NLP that combines summarization and classification by extracting argumentative key points (KPs) for a topic from a collection of texts and categorizing their closeness to the different arguments. In our work, we focus on the legal domain and develop methods that identify and extract KPs from premises derived from texts of judgments. The first method is an adaptation of an existing state-of-the-art method, and the other two are new methods that we developed from scratch. We present our methods and examples of their outputs, as well as a comparison between them. The full evaluation of our results is done in the matching task -- matching the generated KPs to the arguments (premises).
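The matching task can be illustrated with a simple similarity-based baseline: score every (key point, premise) pair and keep matches above a threshold. The examples and the threshold below are invented, and the actual matchers are learned, so this is only a sketch of the task setup.

```python
# Toy KP-to-premise matching via embedding cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
key_points = ["The sentence was disproportionate to the offence.",
              "The defendant's procedural rights were violated."]
premises = ["Counsel was denied access to the case file before trial.",
            "A ten-year term far exceeds the statutory guideline range."]

scores = util.cos_sim(model.encode(key_points, convert_to_tensor=True),
                      model.encode(premises, convert_to_tensor=True))
for i, kp in enumerate(key_points):
    j = int(scores[i].argmax())          # best-matching premise for this KP
    if scores[i, j] > 0.3:               # threshold chosen arbitrarily here
        print(f"KP: {kp!r} <-> premise: {premises[j]!r} ({scores[i, j]:.2f})")
```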
Many real-world applications of language models (LMs), such as code autocomplete and writing assistance, involve human-LM interaction, but the main LM benchmarks are non-interactive, where a system produces output without human intervention. To evaluate human-LM interaction, we develop a framework, Human-AI Language-based Interaction Evaluation (H-LINE), that expands non-interactive evaluation along three dimensions, capturing (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just a third-party assessment; and (iii) notions of preference beyond quality. We then design five tasks ranging from goal-oriented to open-ended to capture different forms of interaction. On four state-of-the-art LMs (three variants of OpenAI's GPT-3 and AI21's J1-Jumbo), we find that better non-interactive performance does not always translate into better human-LM interaction and that first-person and third-party metrics can diverge, suggesting the importance of examining the nuances of human-LM interaction.
We show how the inherent, but often neglected, properties of large-scale LiDAR point clouds can be exploited for effective self-supervised representation learning. To this end, we design a highly data-efficient feature pre-training backbone that significantly reduces the amount of tedious 3D annotations needed to train state-of-the-art object detectors. In particular, we propose a Masked AutoEncoder (MAELi) that intuitively utilizes the sparsity of the LiDAR point clouds in both the encoder and the decoder during reconstruction. This results in more expressive and useful features, directly applicable to downstream perception tasks, such as 3D object detection for autonomous driving. In a novel reconstruction scheme, MAELi distinguishes between free and occluded space and leverages a new masking strategy which targets the LiDAR's inherent spherical projection. To demonstrate the potential of MAELi, we pre-train one of the most widespread 3D backbones in an end-to-end fashion and show the merit of our fully unsupervised pre-trained features on several 3D object detection architectures. Given only a tiny fraction of labeled frames to fine-tune such detectors, we achieve significant performance improvements. For example, with only $\sim800$ labeled frames, MAELi features improve a SECOND model by +10.09 APH/LEVEL 2 on Waymo Vehicles.
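One way such a spherically-aware masking strategy could look is sketched below in numpy: points are binned by (azimuth, elevation) and whole angular cells are dropped, mimicking the sensor's native range-image layout. Bin counts, field of view, and the mask ratio are assumptions for illustration, not MAELi's exact scheme.

```python
# Mask LiDAR points by dropping random cells of their spherical projection.
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(10000, 3)) * [20.0, 20.0, 2.0]   # fake sweep (x, y, z)

azimuth = np.arctan2(points[:, 1], points[:, 0])                          # [-pi, pi]
elevation = np.arctan2(points[:, 2], np.linalg.norm(points[:, :2], axis=1))

# 64 azimuth columns x 16 elevation rows, roughly a spinning sensor's grid.
az_bin = (np.digitize(azimuth, np.linspace(-np.pi, np.pi, 65)) - 1).clip(0, 63)
el_bin = (np.digitize(elevation, np.linspace(-np.pi / 6, np.pi / 6, 17)) - 1).clip(0, 15)
cell = el_bin * 64 + az_bin

# Mask 70% of the angular cells; the encoder only ever sees the rest.
masked_cells = rng.choice(64 * 16, size=int(0.7 * 64 * 16), replace=False)
visible = ~np.isin(cell, masked_cells)

encoder_input = points[visible]    # sparse input to the masked autoencoder
targets = points[~visible]         # geometry the decoder must reconstruct
print(encoder_input.shape, targets.shape)
```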
Research connecting text and images has recently seen several breakthroughs, with models like CLIP, DALL-E 2, and Stable Diffusion. However, the connection between text and other visual modalities, such as lidar data, has received less attention, prohibited by the lack of text-lidar datasets. In this work, we propose LidarCLIP, a mapping from automotive point clouds to a pre-existing CLIP embedding space. Using image-lidar pairs, we supervise a point cloud encoder with the image CLIP embeddings, effectively relating text and lidar data with the image domain as an intermediary. We show the effectiveness of LidarCLIP by demonstrating that lidar-based retrieval is generally on par with image-based retrieval, but with complementary strengths and weaknesses. By combining image and lidar features, we improve upon both single-modality methods and enable a targeted search for challenging detection scenarios under adverse sensor conditions. We also use LidarCLIP as a tool to investigate fundamental lidar capabilities through natural language. Finally, we leverage our compatibility with CLIP to explore a range of applications, such as point cloud captioning and lidar-to-image generation, without any additional training. We hope LidarCLIP can inspire future work to dive deeper into connections between text and point cloud understanding. Code and trained models available at https://github.com/atonderski/lidarclip.
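The supervision scheme can be sketched in a few lines of PyTorch: a point cloud encoder is trained to regress the frozen CLIP embedding of the paired camera image, so lidar lands in the space CLIP text already shares with images. The encoder and data below are toy placeholders, not the released implementation at the repository above.

```python
# Sketch of LidarCLIP-style supervision with a toy point encoder and a faked
# CLIP image embedding (the real code uses an actual frozen CLIP model).
import torch
from torch import nn

class TinyPointEncoder(nn.Module):
    """Toy PointNet-style encoder mapping (N, 3) points to a CLIP-sized vector."""
    def __init__(self, embed_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, embed_dim))
    def forward(self, pts):                       # pts: (batch, n_points, 3)
        return self.mlp(pts).max(dim=1).values    # permutation-invariant pooling

encoder = TinyPointEncoder()
opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)

points = torch.randn(2, 1024, 3)        # paired lidar sweeps (fake)
clip_image_embed = torch.randn(2, 512)  # frozen CLIP image features (fake)

pred = encoder(points)
# Negative cosine similarity pulls lidar embeddings toward the image embeddings.
loss = -nn.functional.cosine_similarity(pred, clip_image_embed).mean()
loss.backward()
opt.step()
```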
The quality of consequences in a decision making problem under (severe) uncertainty must often be compared among different targets (goals, objectives) simultaneously. In addition, the evaluations of a consequence's performance under the various targets often differ in their scale of measurement, classically being either purely ordinal or perfectly cardinal. In this paper, we transfer recent developments from abstract decision theory with incomplete preferential and probabilistic information to this multi-target setting and show how -- by exploiting the (potentially) partial cardinal and partial probabilistic information -- more informative orders for comparing decisions can be given than the Pareto order. We discuss some interesting properties of the proposed orders between decision options and show how they can be concretely computed by linear optimization. We conclude the paper by demonstrating our framework in an artificial (but quite real-world) example in the context of comparing algorithms under different performance measures.
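As a point of reference, the baseline Pareto order that the proposed orders refine admits a standard componentwise definition, where $u_i$ denotes the evaluation of a decision under target $i$:

```latex
% Componentwise (Pareto) dominance over m targets: a is weakly preferred to b
% iff it performs at least as well under every target. The paper's orders are
% strictly more informative, additionally exploiting partial cardinal scales
% and partial probabilistic information.
a \succeq_{\mathrm{Pareto}} b
\;:\Longleftrightarrow\;
u_i(a) \ge u_i(b) \quad \text{for all } i \in \{1, \dots, m\}
```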
What is a rose, visually? A rose comprises its intrinsics, including the distribution of geometry, texture, and material specific to its object category. With knowledge of these intrinsic properties, we may render roses of different sizes and shapes, in different poses, and under different lighting conditions. In this work, we build a generative model that learns to capture such object intrinsics from a single image, such as a photo of a bouquet. Such an image includes multiple instances of an object type. These instances all share the same intrinsics, but appear different due to a combination of variance within these intrinsics and differences in extrinsic factors, such as pose and illumination. Experiments show that our model successfully learns object intrinsics (distribution of geometry, texture, and material) for a wide range of objects, each from a single Internet image. Our method achieves superior results on multiple downstream tasks, including intrinsic image decomposition, shape and image generation, view synthesis, and relighting.